Advanced Data Aggregation with Pandas Functions

pandas
dataframe
aggregation
functions
Master advanced data aggregation techniques in Pandas using built-in functions, custom functions, and lambda expressions. Learn to create powerful summary statistics and custom aggregations.
Author

Mohammed Adil Siraju

Published

September 21, 2025

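Data aggregation is a fundamental operation in data analysis that allows you to summarize and analyze data by groups. This notebook covers:

  • Built-in aggregation functions (sum, mean, max)
  • Custom aggregations with lambda expressions
  • Applying multiple aggregations at once
  • Named custom functions

Mastering these techniques will give you powerful tools for data summarization and analysis.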

1. Setting Up Sample Data

Let’s create a sample dataset to demonstrate various aggregation techniques. We’ll work with categorical data and numerical values.

import pandas as pd

data = {
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30]
}

df = pd.DataFrame(data)
df
  Category  Value
0        A     10
1        B     15
2        A     20
3        B     25
4        A     30

2. Built-in Aggregation Functions

Pandas provides many built-in aggregation functions, such as sum, mean, and max, that you can pass by name to the agg() method. The examples below group the data by Category and summarize the Value column.

Sum Aggregation

Calculate the total sum of values for each category:

df.groupby('Category').agg({'Value': 'sum'})
          Value
Category
A            60
B            40

Mean Aggregation

Calculate the average value for each category:

df.groupby('Category').agg({'Value': 'mean'})
          Value
Category
A          20.0
B          20.0

Maximum Value Aggregation

Find the highest value in each category:

df.groupby('Category').agg({'Value': 'max'})
          Value
Category
A            30
B            25
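
The dictionary syntax becomes more useful once a frame has several numeric columns, because each column can get its own function. The sketch below assumes a hypothetical Quantity column that is not part of the original dataset; it is there only to illustrate the idea.

```python
import pandas as pd

# Same Category/Value data as above, plus a hypothetical 'Quantity' column
# (assumed for illustration only; it is not in the original dataset).
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
    'Quantity': [1, 2, 3, 4, 5],
})

# One aggregation per column: total Value and largest Quantity per category
per_column = df.groupby('Category').agg({'Value': 'sum', 'Quantity': 'max'})
print(per_column)
```

Each key in the dictionary names a column, and each value names the function applied to that column within every group.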

3. Custom Aggregation Functions

Sometimes built-in functions aren’t enough. The agg() method also accepts any callable that reduces a group to a single value, so you can write custom aggregations as lambda expressions or named functions.

Lambda Functions for Custom Aggregation

Create a lambda function to calculate the range (max - min) for each category:

custom_agg = lambda x: x.max() - x.min() 
# Apply to every non-grouping column (here, only 'Value')
df.groupby('Category').agg(custom_agg)
# or target the column explicitly
df.groupby('Category').agg({'Value': custom_agg})
          Value
Category
A            20
B            10
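
A custom aggregation keeps the original column name (Value) in the result, which can hide what was computed. One option is pandas' named aggregation syntax, which labels the output column explicitly. This is a minimal sketch; the name value_range is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

# Named aggregation: output_name=(source_column, function).
# 'value_range' is an illustrative label, not a pandas keyword.
ranges = df.groupby('Category').agg(
    value_range=('Value', lambda s: s.max() - s.min()),
)
print(ranges)
```

The result has a single column called value_range, so the table documents itself when it is passed on to later steps.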

4. Multiple Aggregations

You can apply multiple aggregation functions at once to get comprehensive statistics for each group.

Applying Multiple Built-in Functions

Calculate count, sum, min, max, and mean for each category:

df.groupby('Category')['Value'].agg(['count', 'sum', 'min', 'max', 'mean'])
          count  sum  min  max  mean
Category
A             3   60   10   30  20.0
B             2   40   15   25  20.0
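
The list and dictionary forms can also be combined to request several statistics for several columns at once. The sketch below again assumes a hypothetical Quantity column, and flattening the resulting two-level column index is one possible convention rather than a required step.

```python
import pandas as pd

# 'Quantity' is a hypothetical column added only for this illustration.
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
    'Quantity': [1, 2, 3, 4, 5],
})

# A dictionary of lists yields two-level columns: (column, function)
summary = df.groupby('Category').agg({
    'Value': ['sum', 'mean'],
    'Quantity': ['min', 'max'],
})

# Optionally flatten the two levels into names such as 'Value_sum'
summary.columns = ['_'.join(col) for col in summary.columns]
print(summary)
```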

5. Named Custom Functions

For more complex logic, you can define named functions and use them in aggregations.

Creating a Custom Mean Function

Define a function that computes the mean by hand. It reproduces the built-in 'mean' and is included only to show how a named function plugs into agg():

def custom_mean(values):
    # values is the Series of 'Value' entries for one category
    return sum(values) / len(values)

df.groupby('Category')['Value'].agg(custom_mean)
Category
A    20.0
B    20.0
Name: Value, dtype: float64
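
Named functions can sit alongside built-in names in the same list, and pandas uses the function's name as the column label. The helper value_range below is a hypothetical example written for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

def value_range(values):
    """Spread between the largest and smallest value in one group."""
    return values.max() - values.min()

# Mix a built-in name with a named function; the function's __name__
# becomes the column label in the result.
stats = df.groupby('Category')['Value'].agg(['mean', value_range])
print(stats)
```

Compared with a lambda, the named function gives a readable column label and can be reused or tested on its own.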

Summary

Data aggregation is a powerful tool for summarizing and analyzing grouped data. In this notebook, you learned:

🔧 Built-in Functions

  • sum, mean, max: Standard statistical aggregations
  • Dictionary syntax: agg({'column': 'function'})
  • Multiple functions: agg(['func1', 'func2'])

🎯 Custom Functions

  • Lambda functions: Quick, inline custom logic
  • Named functions: Complex logic with reusable functions
  • Flexible application: Apply to specific columns or entire groups

💡 Key Concepts

  1. Dictionary Aggregation: Specify different functions for different columns
  2. List Aggregation: Apply multiple functions to the same column
  3. Custom Logic: Create domain-specific aggregations

🚀 Best Practices

  • Use built-in functions when possible (passing names such as 'sum' or 'mean' is generally faster than calling a custom Python function for every group)
  • Lambda functions for simple custom logic
  • Named functions for complex, reusable operations
  • Choose appropriate aggregations based on your data and analysis goals
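
As a quick check on the first point, the sketch below confirms that a built-in name and an equivalent lambda give the same numbers on the sample data. It does not measure speed; any difference in run time would only show up on larger datasets.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

grouped = df.groupby('Category')['Value']

# Built-in aggregation, passed by name
builtin = grouped.agg('mean')

# Equivalent custom aggregation, evaluated once per group in Python
custom = grouped.agg(lambda s: s.sum() / len(s))

# Both approaches produce the same per-category values
print((builtin == custom).all())
```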

📊 Next Steps

  • Explore groupby with multiple columns
  • Learn about transformation and filtering operations
  • Practice with real datasets to create meaningful aggregations
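
As a starting point for the first two items, here is a minimal sketch of grouping by two columns, transform, and filter. The Region column and the GroupMean name are assumptions made for this illustration and are not part of the notebook's dataset.

```python
import pandas as pd

# 'Region' is a hypothetical second grouping column.
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Region': ['X', 'X', 'Y', 'Y', 'X'],
    'Value': [10, 15, 20, 25, 30],
})

# Group by two columns: one total per (Category, Region) pair
by_two = df.groupby(['Category', 'Region'])['Value'].sum()

# transform returns one value per original row (here, the category mean)
df['GroupMean'] = df.groupby('Category')['Value'].transform('mean')

# filter keeps only the rows of groups that pass a test (at least 3 rows)
large = df.groupby('Category').filter(lambda g: len(g) >= 3)

print(by_two)
print(df)
print(large)
```

Unlike agg(), transform keeps the original number of rows, and filter drops whole groups instead of summarizing them.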

Mastering aggregation functions will significantly enhance your data analysis capabilities! 🎯📈